Abstract

Introduction: A key aim of triage is to identify those with high risk of cardiac arrest, as they require intensive 
monitoring, resuscitation facilities, and early intervention. We aim to validate a novel machine learning (ML) score 
incorporating heart rate variability (HRV) for triage of critically ill patients presenting to the emergency department 
by comparing the area under the curve, sensitivity and specificity with the modified early warning score (MEWS). 

Methods: We conducted a prospective observational study of critically ill patients (Patient Acuity Category Scale 1 
and 2) in an emergency department of a tertiary hospital. At presentation, HRV parameters generated from a 5-minute
electrocardiogram recording were incorporated with age and vital signs to generate the ML score for each
patient. The patients were then followed up for outcomes of cardiac arrest or death.

Results: From June 2006 to June 2008 we enrolled 925 patients. The area under the receiver operating 
characteristic curve (AUROC) for ML scores in predicting cardiac arrest within 72 hours is 0.781, compared with 
0.680 for MEWS (difference in AUROC: 0.101, 95% confidence interval: 0.006 to 0.197). As for in-hospital death, the 
area under the curve for ML score is 0.741, compared with 0.693 for MEWS (difference in AUROC: 0.048, 95% 
confidence interval: -0.023 to 0.119). A cutoff ML score > 60 predicted cardiac arrest with a sensitivity of 84.1%,
specificity of 72.3% and negative predictive value of 98.8%. A cutoff MEWS > 3 predicted cardiac arrest with a 
sensitivity of 74.4%, specificity of 54.2% and negative predictive value of 97.8%. 

Conclusion: We found ML scores to be more accurate than the MEWS in predicting cardiac arrest within 72 hours. 
There is potential to develop bedside devices for risk stratification based on cardiac arrest prediction. 



Introduction 

In the emergency department (ED), triage is used to 
assess the severity of patients' conditions and to assign 
appropriate treatment priorities. This clinical process
entails the rapid screening of large numbers of patients.
Risk stratification
for cardiac arrest and other adverse cardiac outcomes 
plays an essential role in the management of chest pain
patients in the ED [1]. Medical decisions for disposition 
as well as the required level of intensive monitoring rest 
on this perceived risk [2]. Risk stratification is a necessity
because medical resources are never sufficient for all 
patients to be attended instantaneously in busy EDs and 
hospitals, with limited numbers of doctors, nurses, moni- 
tored beds, resuscitation facilities, intensive care beds, 
operating theaters, and so forth. Quick identification of 
patients of higher severity, who would more urgently 



© 2012 Ong et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.



Hock Ong et al. Critical Care 2012, 16:R108
http://ccforum.com/content/16/3/R108






need and potentially benefit from such resources, is thus 
important. 

Current risk-stratification systems are based on clinical 
judgment and traditional vital signs including heart rate, 
respiratory rate, blood pressure, temperature, and pulse 
oximetry [3]. Unfortunately vital signs have not been 
shown to correlate well with short-term or long-term 
clinical outcomes [4]. The modified early warning score
(MEWS) is one such widely used tool (Table 1). The 
MEWS is based on physiological parameters: systolic 
blood pressure, pulse rate, respiratory rate, temperature 
and AVPU score (A for 'alert', V for 'reacting to vocal sti- 
muli', P for 'reacting to pain', U for 'unconscious'). We 
selected the MEWS as our comparator tool because it is 
widely used in the UK and in Commonwealth countries 
to identify patients at risk of deterioration, and raised 
MEWS values are associated with increased mortality [5]. 
The MEWS can be relatively quickly calculated during 
triage, without the need for laboratory test results, for 
example. Other studies carried out in the UK have shown 
good results in predicting poor outcomes in their patient 
groups [5-7]. However, the assessment of the AVPU 
score is a relatively subjective element in the scoring. 
Also, the range of sensitivities and specificities depends
on the cutoff score used, and the MEWS requires
some training to be applied accurately.
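
As an illustration of how the MEWS is assembled from its five components, the following is a minimal sketch in Python using the bands printed in Table 1. The table leaves small gaps between bands (for example, a respiratory rate of exactly 8); the sketch assumes that each printed upper limit is inclusive and that anything above it falls into the next band. That boundary handling, and all function and variable names, are assumptions made for illustration and are not taken from the study.

```python
INF = float("inf")

# (inclusive upper limit, score) bands read from Table 1.
# Assumption: each printed upper limit is inclusive; values above it
# fall into the next band. Table 1 does not specify this.
RESP_RATE = [(8, 2), (14, 0), (20, 1), (29, 2), (INF, 3)]
HEART_RATE = [(40, 2), (50, 1), (100, 0), (110, 1), (129, 2), (INF, 3)]
SYSTOLIC_BP = [(70, 3), (80, 2), (100, 1), (199, 0), (INF, 2)]
TEMPERATURE = [(35.0, 2), (36.0, 1), (38.0, 0), (38.5, 1), (INF, 2)]
AVPU = {"A": 0, "V": 1, "P": 2, "U": 3}


def band_score(value, bands):
    """Return the score of the first band whose upper limit covers value."""
    for upper, score in bands:
        if value <= upper:
            return score
    raise ValueError(f"no band for value {value!r}")


def mews(resp_rate, heart_rate, systolic_bp, temperature, avpu):
    """Sum the five component scores of the modified early warning score."""
    return (
        band_score(resp_rate, RESP_RATE)
        + band_score(heart_rate, HEART_RATE)
        + band_score(systolic_bp, SYSTOLIC_BP)
        + band_score(temperature, TEMPERATURE)
        + AVPU[avpu]
    )


print(mews(12, 80, 120, 37.0, "A"))  # every component in its zero band
print(mews(18, 115, 95, 37.0, "V"))  # several mildly abnormal components
```

Because the total is a sum of ordinal bands, two patients with quite different physiology can share the same MEWS, which is one reason the choice of cutoff matters.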

Heart rate variability (HRV) is a non-invasive measure- 
ment for investigating autonomic influence on the cardio- 
vascular system that has generated significant interest in 
recent scientific literature [8]. HRV may be defined as the 
change in the time interval between heartbeats, from beat 
to beat. HRV is controlled by the autonomic nervous sys- 
tem, including the sympathetic nervous system and the 
parasympathetic nervous system [9,10]. There is recogni- 
tion of a significant relationship between the autonomic 
nervous system and cardiovascular mortality, including 
sudden cardiac death [11,12]. Recent studies have found 
strong associations between HRV from short-term (2 to 
30 minutes) electrocardiogram (ECG) recordings and 
post-acute myocardial infarction mortality [13,14]. These 
associations suggest that short-term HRV measurements 



may serve as a rapid risk-stratification tool for adverse car- 
diac events. 

Machine learning (ML) is based on the way the human 
brain approaches pattern recognition tasks, providing an 
artificial intelligence-based approach to solve classification 
problems. A model is learned during the training process 
using previously known input-output pairs. The trained 
model is then tested with new data. Various ML topolo- 
gies are available, including single-layer and multi-layer 
feedforward networks. ML adjusts weights of hidden layers 
during training to minimize an error function [15]. 

In this study, we aim to validate a novel ML score 
incorporating HRV for risk stratification of critically ill 
patients presenting to the ED by comparing the area 
under the curve, sensitivity and specificity for prediction 
of cardiac arrest with the MEWS. 

Materials and methods 

Study design 

We conducted a prospective, nonrandomized, observa- 
tional cohort study, looking at critically ill patients 
attended by the Singapore General Hospital ED. Singa- 
pore General Hospital is the oldest and largest acute ter- 
tiary hospital in Singapore. The hospital accounts for 
about one-third of all acute-care public-sector beds and 
about one-quarter of acute beds nationwide. It is a Level 
1 Trauma Centre for Singapore. Annually, about 60,000 
patients are admitted to its wards and another 600,000 
patients are attended to in its Specialist Outpatient 
Clinics. The ED sees between 300 and 500 patients a day. 

All public hospitals in Singapore use a national Patient 
Acuity Category Scale (PACS) for triage at the ED. PACS
1 patients are the most critically ill and must therefore
be attended to without delay. They
would most probably require maximum allocation of 
staff and equipment resources for initial management. 
The severity of their symptoms requires very early atten- 
tion, failing which early deterioration of their medical 
status is likely. PACS 2 patients are nonambulant and
would appear to be in a stable state on initial cardiovas- 
cular examination and are not in danger of imminent 



Table 1 Modified early warning score

Score  Respiratory rate   Heart rate       Systolic blood    Temperature    AVPU
       (breaths/minute)   (beats/minute)   pressure (mmHg)   (°C)
3      -                  -                < 70              -              -
2      < 8                < 40             71 to 80          < 35           -
1      -                  41 to 50         81 to 100         35.1 to 36     -
0      9 to 14            51 to 100        101 to 199        36.1 to 38     Alert
1      15 to 20           101 to 110       -                 38.1 to 38.5   Reacting to voice
2      21 to 29           111 to 129       > 200             > 38.6         Reacting to pain
3      > 29               > 129            -                 -              Unresponsive

AVPU, A for 'alert', V for 'reacting to vocal stimuli', P for 'reacting to pain', U for 'unconscious'.

collapse. PACS 3 patients are ambulant and PACS 4
patients are nonemergencies. PACS is a symptom-based 
triage system and does not have strict physiological cri- 
teria (for example, vital sign cutoff values). 

At Singapore General Hospital ED, all patients are 
initially triaged by a nurse, and those with airway, 
breathing and circulation problems, or those thought to 
be possibly unstable and needing close monitoring, are 
routinely put on ECG monitoring using the LIFEPAK® 
12 defibrillator/monitor (Physio-Control, Redmond, 
WA, USA). These would be PACS 1 patients and some 
PACS 2 patients. 

Patient recruitment and eligibility 

All patients older than 18 years of age requiring contin- 
uous ECG monitoring with PACS 1 or PACS 2 were eli- 
gible. Patients in nonsinus rhythm (asystole, 
supraventricular arrhythmias, ventricular arrhythmias, 
complete heart block, and pacemaker rhythm) were 
excluded because HRV metrics are not reliable for non- 
sinus rhythms. Patients who were subsequently dis- 
charged against medical advice or transferred to another 
hospital for care were considered lost to follow-up and 
excluded from the study as clinical outcomes could not 
be determined. We also excluded cases with a high per- 
centage of artifacts, nonsinus beats, and ectopics com- 
bined together (> 30% of recorded tracing); cases with < 
30% of artifacts and so forth were included, but the 
nonsinus segments of the tracings were trimmed off. 
Patients were only recruited during office hours. The 
initial set of vital signs and HRV parameters obtained 
during triage was recorded for this study. HRV record- 
ings ranged from 5 to 30 minutes. Ethics approval was 
obtained from the Singhealth Centralised Institutional 
Review Board (CIRB Approval No. 2006/018/C) with a 
waiver of patient consent for the study. Patients were 
recruited from June 2006 to June 2008. 

Hospital outcomes 

The primary outcome was cardiac arrest within 72 
hours of presentation to the ED. The event of cardiac 
arrest was defined as sudden unexpected death or a 
resuscitation event requiring cardiopulmonary assistance 
(chest compressions and/or defibrillation). This assis- 
tance was thought to probably reflect a primary cardiac 
event/etiology. Information regarding the nature of 
death or the resuscitation event was extracted from clin- 
ical notes. The secondary outcome was death after 
admission (in-hospital death during current admission, 
including within 72 hours). This endpoint would have 
included patients dying from primary cardiac as well as 
noncardiac etiologies. Patients were followed up until 
discharge or in-hospital death. Information regarding 
the nature of death was extracted from clinical notes. 



For patients who were discharged before 72 hours, elec- 
tronic medical records providing information on admis- 
sion to all public hospitals in Singapore were reviewed 
for study outcomes. 

Data collection and processing 

ECG tracings (long lead I, II, III and 12-lead ECG data) 
obtained during initial presentation from critically ill 
patients on a LIFEPAK 12 defibrillator/monitor were 
downloaded using the CODE-STAT Suite data review 
software (version 5.0; Physio-Control). Lead II ECGs
sampled at 125 Hz were extracted as text files for HRV 
analysis using CODE-STAT™ and proprietary ECG 
extraction software (Physio-Control); 125 Hz is the sam- 
pling rate used by the defibrillator monitor. Since we 
are primarily looking at the QRS complexes and not 
interested in high-frequency features of the ECG, this is 
a sufficient rate of digitization. Cases with ECG record- 
ings were prospectively identified and had identity con- 
firmed by querying ED charts and records. A minimum 
ECG recording of 5 minutes is required in order to 
accurately calculate HRV metrics. 

The ECG records were converted into text (ASCII) 
files using an extraction program available with CODE- 
STAT. The processing program incorporated MATLAB
code (R2009a; The MathWorks, Natick, Massachusetts,
USA), which was used to process the ECG
signals to obtain the HRV variables (see Figure 1), in
accordance with the guidelines outlined by the Task- 
force of the European Society of Cardiology [16]. The 
raw ECG data were first preprocessed to reduce the 
effects of noise and artifacts using a 5 to 28 Hz band- 
pass filter. This frequency range has been found to 
enhance the QRS complex against the background noise 
for easier peak detection [17]. A modified threshold- 
plus-derivative method was used to detect the QRS 
complexes, and all ectopics and other nonsinus beats 
were excluded in accordance with the guidelines out- 
lined by the Taskforce of the European Society of Cardi- 
ology [16], using an automatic detection algorithm. RR 
intervals were then calculated based on the sinus 
rhythm. The beat detection and labeling techniques 
have previously been validated against manually anno- 
tated data from the MIT-BIH database [18] and have 
been found to perform with high accuracy [19]. Ectopic 
beats were identified by the size and shape of the QRS 
complexes as well as the distances between successive 
beats. The height of the QRS complex, width, and RR 
interval were also considered. In addition, atrial fibrilla- 
tion was identified manually by study engineers during 
retrospective verification of ECGs. 
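
To make the last step of this pipeline concrete, the sketch below converts detected R-peak positions (as sample indices at the 125 Hz sampling rate stated above) into RR intervals while skipping any interval that touches a beat labeled as nonsinus. It is a minimal Python illustration, not the study's MATLAB code; the function name, the `is_sinus` flag list, and the rule of dropping both intervals adjacent to an excluded beat are assumptions.

```python
SAMPLING_RATE_HZ = 125  # sampling rate of the defibrillator/monitor; one sample = 8 ms


def rr_intervals(peak_samples, is_sinus, fs=SAMPLING_RATE_HZ):
    """Return RR intervals in seconds between consecutive sinus beats.

    peak_samples: sample index of each detected R peak, in time order.
    is_sinus:     parallel list of booleans, False for ectopic or other
                  nonsinus beats (hypothetical output of a beat labeler).
    """
    rr = []
    for i in range(1, len(peak_samples)):
        # Assumption: an interval is kept only if both beats bounding it are sinus.
        if is_sinus[i - 1] and is_sinus[i]:
            rr.append((peak_samples[i] - peak_samples[i - 1]) / fs)
    return rr


peaks = [0, 100, 200, 260, 400, 500]
labels = [True, True, True, False, True, True]
print(rr_intervals(peaks, labels))  # the two intervals around the ectopic beat are dropped
```

At 125 Hz the timing resolution of each RR interval is 8 ms, which is coarse relative to the 50 ms threshold used by NN50 but adequate for locating QRS complexes.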

The ECG tracings were then analyzed for heart rate 
variability, with both time-domain and frequency- 
domain analyses. Other variables collected were age, 

Parameter                     Description
RR interval                   The time interval in seconds between each QRS beat on the electrocardiogram
Time domain measures
  aRR and STD (s)             Mean and standard deviation of the RR intervals
  RMSSD (s)                   Root mean square of differences between adjacent RR intervals
  NN50 (count) and pNN50 (%)  Number and percentage of consecutive RR intervals differing by more than 50 ms
Geometric measures
  HRV triangular index
  [...]                       [...] histogram using a least squares technique
Frequency domain measures
  VLF (ms²)                   Power in the very low frequency range (< 0.04 Hz)
  LF (ms²)                    Power in the low frequency range (0.04 to 0.15 Hz)
  HF (ms²)                    Power in the high frequency range (0.15 to 0.40 Hz)
  LFnorm (nu)                 LF power in normalized units: LF/(TP-VLF) x 100
  HFnorm (nu)                 HF power in normalized units: HF/(TP-VLF) x 100
  LF/HF ratio                 Ratio of LF power to HF power
  Total Power (ms²)           Variance of all RR intervals (< 0.4 Hz)

Figure 1 List of heart rate variability parameters




gender, medical history including ischemic heart disease, 
diabetes mellitus and chronic renal failure, heart rate, 
blood pressure, respiratory rate, Glasgow Coma Scale 
(GCS), etiology, and medication history. Vital signs 
(heart rate, blood pressure, and oxygen saturation 
(SpO2)) were measured using the Propaq CS vital signs
monitor (Welch Allyn, Skaneateles Falls, NY, USA)
in the ED. The GCS and respiratory rate were
recorded at the time of vital sign measurement. AVPU 
scores were recorded at triage. Tympanic temperatures 
of the patients were taken using a tympanic thermo- 
meter. AVPU scores were scored according to the best 
response during data collection. The collected data were 
used to calculate a MEWS for each patient recruited. 

HRV variables measured included time-domain, fre- 
quency-domain, and geometric parameters. The fre- 
quency-domain parameters were calculated based on 
estimates of power spectral density, obtained using the 
Lomb-Scargle periodogram that is commonly used for 
unevenly sampled sequences. Use of the Lomb-Scargle 
periodogram eliminates the need for interpolation or 
resampling of the sequences [20,21]. 
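
The time-domain measures and the normalized units listed in Figure 1 follow directly from their definitions, as the following minimal Python sketch shows. It is an illustration only: the function names are hypothetical, the use of the sample standard deviation (n - 1 denominator) is an assumption, and the band powers passed to `normalized_units` are made-up inputs standing in for the output of a spectral estimate such as the Lomb-Scargle periodogram, which is not implemented here.

```python
import math


def time_domain_hrv(rr):
    """Time-domain HRV measures from a list of sinus RR intervals in seconds."""
    n = len(rr)
    mean_rr = sum(rr) / n
    # Assumption: sample standard deviation (n - 1 denominator).
    std_rr = math.sqrt(sum((x - mean_rr) ** 2 for x in rr) / (n - 1))
    # Differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Count of adjacent intervals differing by more than 50 ms.
    nn50 = sum(1 for d in diffs if abs(d) > 0.050)
    return {
        "aRR": mean_rr,
        "STD": std_rr,
        "RMSSD": rmssd,
        "NN50": nn50,
        "pNN50": 100.0 * nn50 / len(diffs),
    }


def normalized_units(lf, hf, vlf, total_power):
    """LFnorm and HFnorm as defined in Figure 1: band / (TP - VLF) x 100."""
    denom = total_power - vlf
    return 100.0 * lf / denom, 100.0 * hf / denom


measures = time_domain_hrv([0.80, 0.90, 0.80, 0.82])
print(measures)
print(normalized_units(lf=0.06, hf=0.08, vlf=0.13, total_power=0.27))
```

Normalizing by TP - VLF removes the very low frequency component, so LFnorm and HFnorm express the balance between the two bands rather than their absolute power.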

Machine learning score prediction 

An ML-based prediction model, utilizing age, HRV parameters,
and vital signs, was proposed to compute a risk
score for each patient's hospital outcome [22]. This model was
implemented in MATLAB (R2009a; The MathWorks). In
contrast to traditional mathematical logistic regression 
approaches, this is a multivariate, nonparametric, black- 
box approach. This approach overcomes problems faced 



by traditional statistical models of colinearity and overfit- 
ting. Assuming that each patient's data can be repre- 
sented as a vector of HRV parameters and vital signs, the 
scoring system is built based on the calculation of geo- 
metric distances among a set of feature vectors (that is, 
multiple patients). Classifier selection plays a key role in 
building an efficient prediction system. In this study, the 
support vector machine was adopted to map feature vec- 
tors onto a higher dimensional space and find an optimal 
pattern-separating hyperplane [23,24]. The support vector
machine has shown satisfactory performance in
many areas including ECG beat classification [25], EEG 
analysis [26], and text classification [27]. 

The calculation of the ML score is straightforward. 
First, cluster centers of both positive and negative sam- 
ples are calculated in Euclidean space, where positive 
samples are patients with cardiac arrest or death as out- 
comes and negative samples are patients without the 
above outcomes. A score is then computed by evaluating 
Euclidean distances between a patient's data and both 
cluster centers. Last, the risk score is fine-tuned through 
a novel imbalanced learning strategy. If the predicted 
outcome is positive, the risk score will be increased. 

As shown in Table 2, the database consists of a majority
group of normal samples and a minority group of samples
with abnormal outcomes (cardiac arrest or death);
that is, the dataset is imbalanced. Common ML algorithms
cannot be directly applied to this imbalanced
database for score tuning, because the majority
class will dominate the learning process and lead to
poor generalization performance on new patients from

Table 2 Baseline characteristics of study patients

Characteristic                No cardiac arrest within    Cardiac arrest within     P value (a)
                              72 hours (n = 882)          72 hours (n = 43)
Age (years)
  Median (IQR)                62 (50 to 74)               70 (59 to 78)             0.018
Male gender                   542 (61.5)                  31 (72.1)                 0.198
Race                                                                                0.134
  Chinese                     586 (66.4)                  35 (81.4)
  Malay                       130 (14.7)                  5 (11.6)
  Indian                      118 (13.4)                  1 (2.3)
  Others                      48 (5.4)                    2 (4.7)
Diagnosis grouping (b)                                                              0.275
  Cardiovascular              359 (40.7)                  9 (20.9)
  Respiratory                 137 (15.5)                  10 (23.3)
  Neurological                87 (9.9)                    4 (9.3)
  Gastrointestinal            46 (5.2)                    0 (0)
  Renal                       25 (2.8)                    0 (0)
  Endocrine                   58 (6.6)                    4 (9.3)
  Infectious diseases         59 (6.7)                    1 (2.3)
  Vascular                    22 (2.5)                    3 (7.0)
  Trauma                      31 (3.5)                    2 (4.7)
  Cancer                      28 (3.2)                    9 (20.9)
  Others                      30 (3.4)                    1 (2.3)
Medical history (c)
  Diabetes                    295 (33.4)                  5 (11.6)                  0.405
  Hypertension                472 (53.5)                  18 (41.9)                 0.876
  Heart disease               292 (33.1)                  12 (27.9)                 0.868
  Renal disease               115 (13.0)                  6 (14.0)                  0.251
  Respiratory disease         103 (11.7)                  7 (16.3)                  0.222
  Stroke                      63 (7.1)                    0 (0)                     1.000
  Cancer                      69 (7.8)                    2 (4.7)                   1.000
  Others                      523 (59.3)                  32 (74.4)                 0.204
Prior medical therapy (d)
  Beta-blockers               227 (25.7)                  14 (32.6)                 0.726
  Calcium-channel blockers    165 (18.7)                  11 (25.6)                 0.432
  Digoxin                     36 (4.1)                    2 (4.7)                   0.415
  Amiodarone                  12 (1.4)                    0 (0)                     1.000
  Other anti-arrhythmics      5 (1.4)                     0 (0)                     1.000

Data shown are numbers (%) unless otherwise stated. IQR, interquartile range (25th to 75th percentiles). (a) P value from either the chi-square test or the Mann-
Whitney test as appropriate. (b) Based on admitting emergency physician clinical diagnosis. (c) Medical history at presentation to the emergency department. (d) Prior
outpatient medical therapy at presentation to the emergency department.



the minority class. The solution to handling data imbal- 
ance is to create a decision ensemble [28]. Our method 
partitions the samples of majority class into N nonover- 
lapped groups, with each group joined by minority sam- 
ples. By doing so, N balanced datasets are created, on 
which a prediction model is trained to distinguish minority
and majority classes. The ML algorithm was trained
and validated using a leave-one-out strategy.
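
The partitioning step described above can be sketched in a few lines of Python. This is an illustrative assumption of how such balanced subsets might be built, not the study's implementation: the function name, the random shuffling before splitting, and the choice of N = 20 in the usage example are all hypothetical. Training one model per subset and combining their decisions is left out.

```python
import random


def balanced_subsets(majority, minority, n_groups, seed=0):
    """Split the majority class into n_groups non-overlapping groups and
    join every group with the full set of minority samples."""
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment to groups
    # Striding by n_groups gives disjoint groups that together cover the majority class.
    groups = [shuffled[i::n_groups] for i in range(n_groups)]
    return [group + list(minority) for group in groups]


# Usage with the class sizes of the primary outcome (882 vs 43 patients).
majority = [("neg", i) for i in range(882)]
minority = [("pos", i) for i in range(43)]
subsets = balanced_subsets(majority, minority, n_groups=20)
print(len(subsets), [len(s) for s in subsets[:3]])
```

With 882 majority and 43 minority samples, a group count near 20 yields subsets in which the two classes are roughly equal in size, so no single classifier is dominated by the majority class.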

Statistical analyses 

Continuous variables are presented as means (standard 
deviation) or medians (interquartile range) and were 
analyzed using a two-tailed Student's t test and the 



Wilcoxon rank-sum test, respectively. Categorical vari- 
ables are presented as numbers (percentage) and were 
analyzed using the chi-square test or Fisher's exact test
when appropriate. 

The ML score and the MEWS were calculated for all 
patients and analyzed for a significant association 
between the scores and the incidence of cardiac arrest 
or death, and adverse cardiac events. Receiver operating 
characteristic (ROC) curves were based on the continu- 
ous measurements of the ML score and the ordinal 
measurements of the MEWS. 
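
For readers unfamiliar with how an area under the ROC curve is obtained from raw scores, the sketch below uses the rank-based (Mann-Whitney) formulation: the AUROC equals the probability that a randomly chosen patient with the outcome scores higher than a randomly chosen patient without it, with ties counted as one half. The study used SPSS and STATA for this; the Python function and the toy scores here are illustrative assumptions only.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matters for an ordinal score such as the
    MEWS where many patients share the same value.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy ordinal scores with one tie between the groups.
print(auroc([3, 4, 5], [1, 2, 3]))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of this size; a rank-sum implementation would be preferable for much larger datasets.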

Patients were categorized into low, intermediate, and 
high risk groups according to their ML scores, based on 
review of the data (selecting the cutoff values that pro-
vided the best discrimination): low risk, ML score 0 to 
40; intermediate risk, ML score 41 to 60; and high risk, 
ML score 61 to 100. 

The area under the receiver operating characteristic 
curve (AUROC) for the ML score and the MEWS was 
calculated and compared using a z-statistic made from 
the difference between the two AUROCs divided by the 
standard error of the difference in the AUROCs [29]. 
The confidence interval (CI) for reporting the difference 
between the two AUROCs was also derived. Statistical 
comparison tests between the ML score and the MEWS 
for sensitivity and specificity were done by applying the 
McNemar test to the disease group (comparison of sen- 
sitivities) and the nondiseased group (comparison of 
specificities) [30]. Similarly, statistical comparison tests 
between ML and the MEWS for the predictive values 
were done using the same method [30], whereby
the test statistics had a chi-square distribution with one 
degree of freedom. A statistical comparison test for the 
likelihood ratio of a positive test was not performed, 
however, because no well-established method of com- 
parison was found in the literature. Whenever possible, 
the 95% CI for the difference in diagnostic value 
between the two scoring methods was provided. For the 
predictive values, the CI of the difference was not read- 
ily computable by well-established methods, but signifi- 
cance tests were carried out for these comparisons. 
Moreover, separate CIs of the predictive value within 
each scoring method are presented. A general rule of 
thumb is that CIs can overlap as much as 29% and the 
statistics can still be significantly different (see [31] 
chapter 2.6: overlapping confidence intervals do not 
imply nonsignificance). The ML score was further cate- 
gorized into low, intermediate, and high risk scores and 
was tested for a significant relationship with the rates of 
cardiac arrest or death. Optimum cutoff points were 
determined using sensitivity and specificity analysis. 
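
The z-statistic for comparing two AUROCs can be illustrated with the values reported for the primary outcome. In the Python sketch below, the standard error of the difference is not taken from the paper: it is back-calculated here from the reported 95% CI (0.006 to 0.197) under a normal approximation, so the resulting z and P value are approximate and are shown only to make the arithmetic concrete. The function name is hypothetical.

```python
import math


def auroc_difference_test(auc_a, auc_b, se_diff):
    """z-statistic, two-sided P value, and 95% CI for a difference in AUROCs."""
    diff = auc_a - auc_b
    z = diff / se_diff
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
    return z, p_value, ci


# Assumption: SE recovered from the reported CI width, (0.197 - 0.006) / (2 x 1.96).
se_assumed = (0.197 - 0.006) / (2 * 1.96)
z, p_value, ci = auroc_difference_test(0.781, 0.680, se_assumed)
print(round(z, 2), round(p_value, 3), ci)
```

Because both scores are measured on the same patients, the standard error of the difference must account for the correlation between the two AUROCs; that correlation is already absorbed in the back-calculated value used here.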

Unless otherwise specified, P < 0.05 was considered to 
indicate statistical significance. All data were stored with 
Excel (Microsoft Office 2007; Microsoft, Redmond, WA, 
USA) and imported into SPSS software (version 17.0; 
SPSS Inc., Chicago, IL, USA) and STATA software (ver- 
sion 11.1; STATA Corporation, College Station, TX, 
USA) for statistical analysis. 

Results 

Baseline characteristics 

A total of 1,025 ECG tracings were collected during this 
period. Out of these tracings, 100 were excluded due to 
a high percentage of artifacts, nonsinus beats, ectopics, 
or missing data. A total of 925 patients were recruited 
during the study. Table 2 shows the characteristics of 
the study patients. The diagnosis grouping shown was 



based on the admitting emergency physician clinical 
diagnosis. The largest diagnosis grouping is the cardiovascular
group at 368 (39.8%), followed by the respiratory
group at 147 (15.9%).

Outcomes 

Forty-three (4.6%) of the total sample developed cardiac 
arrest within 72 hours, while 86 (9.3%) died after admis- 
sion (including those deaths within 72 hours). The 
respiratory diagnosis group had the largest number of 
primary outcomes at 10 (23.3%), followed by cardiovas- 
cular and cancer groups both at nine incidences (20.9%). 
Neither the gastrointestinal nor the renal group had
any patients with the primary outcome (cardiac arrest). The
respiratory diagnosis group had the largest number of
secondary outcomes at 19 (22.1%), followed by the car- 
diovascular diagnosis group at 18 (20.9%). 

Table 3 shows the relationship of the predictor factors 
with the outcome of cardiac arrest within 72 hours and 
death after admission. Those factors found to have sig- 
nificant association (P < 0.05) with the primary outcome 
included the GCS, pulse rate, respiratory rate, SpO2,
aRR, avHR, sdHR, RR triangular index, LS-VLF power, 
LS-HF power, LS-LF norm, LS-HF norm, and MEWS. 
Those factors found to have significant association (P < 
0.05) with the secondary outcome included the GCS, 
respiratory rate, SpO2, aRR, avHR, RMSSD, RR triangular
index, TINN, LS-VLF power, LS-HF power, LS-LF
norm, LS-HF norm, LF/HF ratio, and MEWS. 

The ROC and AUCs of the ML score and the MEWS 
for predicting cardiac arrest within 72 hours or death 
after admission are illustrated in Figures 2 and 3,
respectively. 

Eighty-nine patients (9.6%), 576 patients (62.3%), and 
260 patients (28.1%) were in the low, intermediate, and 
high risk ML score groups, respectively. Rates of cardiac 
arrest within 72 hours were 0%, 1.6% (95% CI: 6.59 to 
9.79), and 13.1% (95% CI: 1.75 to 24.45) in the low, 
intermediate, and high risk groups, respectively, as 
shown in Figure 4. Rates of death after admission were 
2.3% (95% CI: 18.48 to 23.08), 29.1% (95% CI: 11.30 to 
46.90), and 68.6% (95% CI: 56.76 to 80.44) in the low, 
intermediate, and high risk groups, respectively. 

Table 4 shows the sensitivity, specificity, positive pre- 
dictive values, and negative predictive values for the ML 
score and the MEWS for predicting cardiac arrest 
within 72 hours or death after admission. The AUROC 
of the ML score was higher compared with the MEWS 
for the primary outcome of cardiac arrest (0.781 vs. 
0.680, difference in AUROC: 0.101, 95% CI: 0.006 to 
0.197; P = 0.037) but not for the secondary outcome of 
death. For prediction of cardiac arrest within 72 hours 
after presentation, the sensitivity and specificity of the 
ML score were 81.4 and 72.3, respectively, compared 

Table 3 Measurements of MEWS, vital signs and heart rate variability of study patients

Variable                              No cardiac arrest   Cardiac arrest      P value (a)  No death           Death              P value (a)
                                      within 72 hours     within 72 hours                  (n = 839)          (n = 86)
                                      (n = 882)           (n = 43)
Age                                   61 (16)             66 (16)             0.047        61 (16)            69 (16)            < 0.001
Vital signs
  Glasgow Coma Scale                  15 (15 to 15)       15 (11 to 15)       0.002        15 (15 to 15)      15 (10 to 15)      < 0.001
  Temperature (°C)                    37 (1)              37 (1)              0.118        37 (1)             37 (1)             0.280
  Pulse rate (beats/minute)           96 (30)             106 (25)            0.026        96 (30)            102 (25)           0.055
  Respiratory rate (breaths/minute)   19 (5)              20 (5)              0.040        19 (5)             20 (5)             0.021
  Systolic BP (mmHg)                  136 (38)            125 (34)            0.082        136 (37.899)       130 (40.761)       0.207
  Diastolic BP (mmHg)                 77 (22)             75 (20)             0.460        78 (21)            73 (23)            0.076
  Oxygen saturation (%)               96 (6)              93 (13)             < 0.001      96 (6)             94 (9)             0.001
  Pain score                          0 (0 to 3)          0 (0 to 3)          0.825        0 (0 to 3)         0 (0 to 0)         0.010
HRV variables
  aRR (s)                             0.718 (0.177)       0.621 (0.149)       < 0.001      0.721 (0.175)      0.644 (0.176)      < 0.001
  STD (s)                             0.053 (0.033)       0.057 (0.047)       0.490        0.054 (0.033)      0.051 (0.044)      0.534
  avHR (beats/minute)                 89.004 (20.566)     102.692 (22.136)    < 0.001      88.582 (20.317)    99.971 (22.975)    < 0.001
  sdHR (beats/minute)                 6.463 (3.740)       8.152 (5.531)       0.005        6.473 (3.731)      7.204 (4.877)      0.094
  RMSSD (ms)                          0.039 (0.041)       0.048 (0.054)       0.272        0.038 (0.039)      0.048 (0.063)      0.184
  nn50 (count)                        571 (1,268)         484 (748)           0.655        547 (1,178)        765 (1,800)        0.123
  pnn50                               7.048 (12.268)      6.561 (9.667)       0.797        6.878 (11.773)     8.460 (15.411)     0.251
  RR triangular index                 2.475 (1.000)       2.046 (0.736)       0.006        2.486 (1.000)      2.158 (0.840)      0.004
  TINN (ms)                           0.217 (0.130)       0.189 (0.148)       0.170        0.219 (0.130)      0.184 (0.142)      0.017
  VLF power (ms²)                     0.131 (0.102)       0.099 (0.084)       0.045        0.133 (0.101)      0.098 (0.097)      0.002
  LF power (ms²)                      0.057 (0.042)       0.056 (0.045)       0.876        0.058 (0.043)      0.053 (0.040)      0.319
  HF power (ms²)                      0.080 (0.070)       0.103 (0.083)       0.032        0.079 (0.070)      0.100 (0.077)      0.008
  Total power (ms²)                   0.268 (0.135)       0.259 (0.131)       0.673        0.269 (0.135)      0.251 (0.137)      0.223
  LF power (nu)                       45.430 (18.489)     36.459 (15.798)     0.002        45.938 (18.586)    35.990 (14.452)    < 0.001
  HF power (nu)                       54.566 (18.487)     63.541 (15.798)     0.002        54.058 (18.583)    64.010 (14.452)    < 0.001
  LF/HF ratio                         1.205 (1.284)       0.802 (1.171)       0.043        1.232 (1.299)      0.742 (0.992)      0.001
MEWS                                  2 (1 to 4)          4 (2 to 5)          < 0.001      2 (1 to 4)         4 (2 to 5)         < 0.001

Data shown are mean (standard deviation) or median (interquartile range, 25th to 75th percentiles). BP, blood pressure; HRV, heart rate variability; MEWS,
modified early warning score. (a) P value from either an unpaired t test or the Mann-Whitney test as appropriate.



with sensitivity and specificity of the MEWS being 74.4 
and 54.2, respectively (difference in sensitivity: 7.0, 95% 
CI: -11.1 to 21.9; and difference in specificity: 18.1, 95% 
CI: 14.3 to 22.0). Specificity for cardiac arrest but not 
sensitivity was thus significantly higher in ML compared 
with the MEWS. The positive predictive value of the 
ML score was higher (12.5, 95% CI: 9.0 to 17.1) com- 
pared with the positive predictive value of the MEWS 
(7.4, 95% CI: 5.3 to 10.3; P < 0.001). The likelihood ratio 
of a positive test for the ML score was higher (2.94, 95% 
CI: 2.46 to 3.52) compared with the likelihood ratio of 
the MEWS (1.62, 95% CI: 1.34 to 1.96). As for predic- 
tion of death after admission, the specificity of the ML 
score (73.9) was higher compared with the specificity of 
the MEWS (55.7) (difference in specificity: 18.2; 95% CI: 
14.3 to 22.2). The positive predictive value for the ML 
score (21.5, 95% CI: 16.9 to 26.9) was higher compared 
with the positive predictive value of the MEWS (14.7, 
95% CI: 11.5 to 18.4; P < 0.001). The likelihood ratio of 



the ML score (2.67, 95% CI: 2.23 to 3.20) was higher 
compared with the likelihood ratio of the MEWS
(1.68, 95% CI: 1.45 to 1.94).
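
The relationship between these diagnostic measures can be checked with a short calculation. The Python sketch below derives sensitivity, specificity, predictive values, and the positive likelihood ratio from a 2 x 2 table. The cell counts are not reported in the text: they are back-calculated here from the reported ML score sensitivity (81.4%) and specificity (72.3%) for cardiac arrest, with 43 patients with and 882 without the outcome, and should be read as an illustrative assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard accuracy measures from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "lr_positive": sensitivity / (1.0 - specificity),
    }


# Assumed counts, reconstructed from the reported percentages (not from the paper):
# 35 of 43 arrests flagged, 638 of 882 non-arrests correctly not flagged.
ml = diagnostic_metrics(tp=35, fn=8, tn=638, fp=244)
for name, value in ml.items():
    print(name, round(value, 3))
```

With an event rate under 5%, even a specificity above 70% leaves false positives far outnumbering true positives, which is why the positive predictive value stays low while the negative predictive value is high.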

Discussion 

The results of this study showed that an ML score incorporating
vital signs and HRV parameters is more predictive of cardiac
arrest within 72 hours of presentation to the ED than the MEWS.
Categorization of patients into low, intermediate, and high risk
groups according to the ML score is a useful predictor of risk for
cardiac arrest and death. The ML score represents a noninvasive
and objective risk-stratification tool that can be determined
immediately at presentation to the ED. As diagnosis in the ED
setting is often time dependent, medical decisions in the ED
(disposition, level of monitoring, aggressive management) are
often made based on risk assessment, rather than on diagnosis. We
believe this is the first study to show the potential of an ML
model incorporating age, vital signs, and HRV for predicting
cardiac arrest and death.

Figure 2 Machine learning score and modified early warning
score predicting cardiac arrest within 72 hours. Receiver
operating characteristics (ROC) curve analysis of the machine
learning (ML) score and the modified early warning score (MEWS) in
predicting cardiac arrest within 72 hours. (Horizontal axis:
1 - specificity.)

Our study also indicates that HRV measured from short-term ECG
recordings (5 to 30 minutes), when combined with vital signs,
provides a useful tool for risk stratification in the ED. In this
study, depressed HRV parameters were associated with early (within
72 hours) adverse cardiac events and death after admission
(Table 3). This is consistent with the findings of previous
studies that suggest short-term measurement of frequency-domain
HRV parameters is strongly associated with cardiac death and
mortality [11-13,32].
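For readers less familiar with the frequency-domain parameters
discussed here, the sketch below shows the conventional way LF and HF
power are expressed in normalized units (nu): each is taken as a
percentage of their sum, so the two values add to 100 and the LF/HF
ratio is unaffected by the normalization. This convention is an
assumption on our part, since the exact computation is not restated in
this section. The function name and the absolute power values are
hypothetical.

```python
def normalized_units(lf_power, hf_power):
    """Return (LF nu, HF nu, LF/HF ratio) from absolute LF and HF power.

    Assumes the conventional definition in which normalized units are
    percentages of LF + HF power.
    """
    total = lf_power + hf_power
    if lf_power < 0 or hf_power <= 0:
        raise ValueError("LF power must be non-negative and HF power positive")
    lf_nu = 100.0 * lf_power / total
    hf_nu = 100.0 * hf_power / total
    return lf_nu, hf_nu, lf_power / hf_power


# Hypothetical absolute powers in ms^2, for illustration only.
lf_nu, hf_nu, ratio = normalized_units(300.0, 500.0)
print(round(lf_nu, 1), round(hf_nu, 1), round(ratio, 2))
```

Under this definition a fall in LF (nu) is always mirrored by an equal
rise in HF (nu), so the two normalized measures carry the same
information.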

Decreased HRV has been found to predict increased mortality in
older patients [33], and in coronary artery disease [14,34],
post-myocardial infarction [11], congestive heart failure [35],
and dilated cardiomyopathy [36]. Altered spectral HRV analysis
has been found to be an indicator of severity in congestive heart
failure [37], hypertension [38], coronary artery disease [39],
angina [40], myocardial infarction [41], hypovolemia, hypoxia
[42], chronic renal failure [43], and diabetes mellitus [44].
Decreased HRV has also been found in ICU patients following head
trauma [45-49], sepsis [50], and septic shock [51,52]. HRV has
also been used as a marker of severity in ED patients with sepsis
[53]. Depressed HRV may reflect a decrease in vagal activity
directed to the heart that leads to prevalence of sympathetic
mechanisms [54], and therefore to cardiac instability, which
might explain the higher risk of arrhythmic deaths [14,55].
However, the true sympatho-vagal correlates of HRV and the
mechanisms behind reduced HRV remain largely unknown [56].

In our previous research, we proposed using a combination of
age, HRV measures, and vital signs as a predictor of patient
outcomes and demonstrated that the combined features
significantly improve predictive accuracy, sensitivity, and
specificity compared with using HRV alone [22,57]. As Table 3
shows, not all of the vital signs were highly predictive when
used in isolation. HRV parameters also tend to be highly
correlated. By using an ML approach, we were able to overcome
some of these limitations as well as the overfitting associated
with traditional statistical methods [58]. We have investigated
an extreme learning machine and a support vector machine with
different activation/kernel functions as classifiers, and found
that the linear support vector machine provides the highest
confidence in categorizing patients into two outcomes: death and
survival. Furthermore, we have also presented a new
segment-based decision-making strategy for outcome prediction
[22].
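To make the idea of a linear support vector machine classifier
concrete, the following is a minimal sketch of one trained by
Pegasos-style stochastic subgradient descent on synthetic two-feature
data. It is not the implementation used in this study: the solver,
the regularization strength, the iteration count, and the data are all
assumptions made for illustration, and the two features merely stand
in for standardized predictors such as an HRV measure and a vital
sign.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, n_iter=5000, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on the hinge loss.

    samples: list of feature lists; labels: +1 or -1 for each sample.
    A constant 1.0 is appended to every sample so the last weight acts
    as a bias term (which is therefore regularized as well).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(data))
        x, y = data[i], labels[i]
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Shrink the weights (regularization step).
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Move towards the sample only if it violates the margin.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Synthetic, well-separated data: +1 and -1 play the role of two outcomes.
data_rng = random.Random(1)
X = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)] for _ in range(50)]
X += [[data_rng.gauss(-2.0, 0.5), data_rng.gauss(-2.0, 0.5)] for _ in range(50)]
y = [1] * 50 + [-1] * 50

w = train_linear_svm(X, y)
accuracy = sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

A linear decision function of this kind keeps one weight per
predictor, which is one reason such models are comparatively easy to
inspect when predictors are correlated.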

Limitations 

Several limitations are inherent in our study. This was a
single-center study carried out at a tertiary teaching hospital
in Singapore, and the results may not be generalizable to other
settings. However, we are confident that the current dataset
accurately represents the management of ED patients in our
hospital.

The diverse types of conditions in the patients recruited may
have different effects on the MEWS and ML scores. The main
diagnosis groupings are cardiovascular, respiratory,
neurological, gastrointestinal, renal, endocrine, infectious
disease, vascular, trauma, cancer, and others. The different
diagnosis groupings may cause inaccuracies in the predictions of
the ML score because the HRV in noncardiovascular conditions may
differ from that in cardiovascular-related conditions. Future
studies should therefore analyze each of the subgroups
separately.

Figure 3 Machine learning score and modified early warning
score predicting death within 72 hours. Receiver operating
characteristics (ROC) curve analysis of machine learning (ML) score
and the modified early warning score (MEWS) in predicting death
after admission. (Horizontal axis: 1 - specificity.)

Another limitation of our study is that while the ML 
score has been shown to have good internal validity, 
there is a need for external validation of the score for 
routine clinical use. 

Patients in nonsinus rhythm were excluded from this study. It
remains to be seen whether the ML score will remain a good
predictor of adverse outcomes in patients with irregular heart
rhythms. Previous studies in patients with atrial fibrillation
have found an association between HRV and an increased risk for
cardiac death [59], as well as recurrence of atrial fibrillation
[60]. Patients in atrial fibrillation, however, represent only a
minority of patients, estimated to be approximately 2 to 4% of
patients between ages 60 and 79 and < 1% of patients below the
age of 55 [61].

Another limitation of our study is the lack of follow-up for
patients discharged from the ED. To minimize missed cases of
adverse cardiac events, the electronic medical records of
patients who were discharged before 72 hours were checked for any
admissions to public hospitals in Singapore.

Table 4 Discriminatory values of the machine learning score and the modified early warning score

Variable | ML score (95% CI) a | MEWS (95% CI) b | Difference (95% CI for difference) c | P value
-------- | ------------------- | --------------- | ------------------------------------ | -------
Cardiac arrest within 72 hours after presentation | | | |
Area under ROC curve | 0.781 | 0.680 | 0.101 (0.006 to 0.197) | 0.037
Sensitivity | 81.4 | 74.4 | 7.0 (-11.1 to 21.9) | 0.581
Specificity | 72.3 | 54.2 | 18.1 (14.3 to 22.0) | < 0.001
Positive predictive value | 12.5 (9.0 to 17.1) | 7.4 (5.3 to 10.3) | - | < 0.001
Negative predictive value | 98.8 (97.5 to 99.4) | 97.8 (95.9 to 98.8) | - | 0.133
Likelihood ratio (+) d | 2.94 (2.46 to 3.52) | 1.62 (1.34 to 1.96) | - | -
Death after admission e | | | |
Area under ROC curve | 0.741 | 0.693 | 0.048 (-0.023 to 0.119) | 0.185
Sensitivity | 69.8 | 74.4 | -4.7 (-16.7 to 7.4) | 0.572
Specificity | 73.9 | 55.7 | 18.2 (14.3 to 22.2) | < 0.001
Positive predictive value | 21.5 (16.9 to 26.9) | 14.7 (11.5 to 18.4) | - | < 0.001
Negative predictive value | 96.0 (94.1 to 97.3) | 95.5 (93.2 to 97.1) | - | 0.608
Likelihood ratio (+) d | 2.67 (2.23 to 3.20) | 1.68 (1.45 to 1.94) | - | -

CI, confidence interval; MEWS, modified early warning score; ML, machine learning; ROC, receiver operating characteristic. a A cutoff value of 60 and above was
used for the ML score. b A cutoff value of 3 and above was used for the MEWS. c The 95% confidence interval for the difference between the ML score and the
MEWS was calculated for each diagnostic statistic except the positive predictive value, negative predictive value, and likelihood ratio (+), for which the method
is not well established. d Likelihood ratio of a positive test. e In-hospital death during current admission. -, 95% confidence interval or statistical test not
computed because the method is not well established for the diagnostic statistics concerned.
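The discriminatory statistics in Table 4 can be illustrated with a
small worked calculation. The sketch below computes an AUROC as the
proportion of event/non-event pairs in which the event case has the
higher score (ties count one half), and then the sensitivity and
specificity obtained when scores at or above a cutoff are treated as
positive, as in the table footnotes. The scores and outcomes are
invented for illustration and are not study data; the function names
are likewise hypothetical.

```python
def auroc(scores, outcomes):
    """AUROC as the fraction of (event, non-event) pairs ranked correctly."""
    pos = [s for s, o in zip(scores, outcomes) if o == 1]
    neg = [s for s, o in zip(scores, outcomes) if o == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos) * len(neg))


def sens_spec_at_cutoff(scores, outcomes, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, o in zip(scores, outcomes) if o == 1 and s >= cutoff)
    fn = sum(1 for s, o in zip(scores, outcomes) if o == 1 and s < cutoff)
    tn = sum(1 for s, o in zip(scores, outcomes) if o == 0 and s < cutoff)
    fp = sum(1 for s, o in zip(scores, outcomes) if o == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Invented scores and outcomes (1 = event), for illustration only.
scores = [82, 75, 40, 66, 30, 55, 20, 61]
outcomes = [1, 1, 0, 0, 0, 1, 0, 0]

area = auroc(scores, outcomes)
sensitivity, specificity = sens_spec_at_cutoff(scores, outcomes, cutoff=60)
print(f"AUROC={area:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

The AUROC summarizes ranking across all possible cutoffs, whereas
sensitivity and specificity describe a single operating point, which
is why the two kinds of statistic can favor different scores.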

The ML score is also based on a single ECG recording. Although
this represents a rapid method for risk stratification, not much
is known about the change of HRV variables over time. Studies
have suggested that acute changes in HRV occur before the onset
of ventricular tachycardia [62-65]. Changes over time in HRV
have also been found to occur in the early phase of recovery
after myocardial infarction [66-68]. Serial analyses of changes
in HRV were not performed in our study and should be
investigated in follow-up studies.
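As an illustration of what a serial analysis might look like, the
sketch below tracks one time-domain HRV measure, the root mean square
of successive RR-interval differences (RMSSD), over successive
windows of a recording. This was not part of the study: the window
length, the step, and the RR intervals are hypothetical, and RMSSD is
used only because it is simple to compute.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive differences of RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def windowed_rmssd(rr_ms, window, step):
    """RMSSD for each window of `window` beats, advancing by `step` beats."""
    if window < 2 or step < 1:
        raise ValueError("window must be >= 2 and step >= 1")
    return [
        rmssd(rr_ms[start:start + window])
        for start in range(0, len(rr_ms) - window + 1, step)
    ]


# Hypothetical RR intervals (ms) whose beat-to-beat variation shrinks over time.
rr = [800, 810, 790, 805, 795, 800, 802, 798, 801, 799, 800, 800]
series = windowed_rmssd(rr, window=6, step=3)
print([round(v, 2) for v in series])
```

A falling sequence of window values, as in this invented example,
is the kind of trend a serial analysis would be looking for.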

Future studies 

The results of this study should be validated with a larger
sample size, in view of the rare outcome of cardiac arrest within
72 hours or death. Further studies should also be carried out to
validate the ML score in a prospective series in patients with
different diagnosis groupings.

We have since developed a laptop-based prototype to acquire
real-time signals and to process and analyze HRV parameters.
This device incorporates ECG and other vital signs such as blood
pressure, pulse oximetry, and respiratory rate, together with
clinical data, for instantaneous, intelligent prediction of
cardiac arrest and mortality using neural networks. In the
future, we aim to prospectively validate the prediction scores
generated by our device with critically ill patients, including
the assessment of the effect of ongoing treatment on our
prediction index and survival. Further development is also
needed to produce a stand-alone device, ready for clinical use
and possible clinical trials. We believe that there exists
potential for the development of bedside devices capable of
real-time monitoring of HRV, which may help physicians to
identify patients at high risk for cardiac arrest and death.

Conclusion 

In critically ill patients presenting to the ED, we found ML
scores to be more accurate than the MEWS in predicting cardiac
arrest within 72 hours. The results of this study also suggest
that initial short-term HRV measurements, in addition to vital
signs, may play a role in early, rapid, and objective risk
stratification of patients during triage.

Key messages 

• We determined the relationship of the predictor factors with
the outcomes of cardiac arrest within 72 hours and death after
admission. Factors significantly associated (P < 0.05) with the
primary outcome included the GCS, pulse rate, respiratory rate,
SpO2, aRR, avHR, sdHR, RR triangular index, LS-VLF power, LS-HF
power, LS-LF norm, LS-HF norm, and MEWS.

• Factors significantly associated (P < 0.05) with the secondary
outcome included the GCS, respiratory rate, SpO2, aRR, avHR,
RMSSD, RR triangular index, TINN, LS-VLF power, LS-HF power,
LS-LF norm, LS-HF norm, LF/HF ratio, and MEWS.

• The AUROCs of the ML score for both primary and secondary
outcomes (0.781 and 0.741, respectively) are higher than those
for the MEWS (0.680 and 0.693, respectively).

• For predicting cardiac arrest within 72 hours, the sensitivity
and specificity of the ML score are 81.4 (95% CI: 66.1 to 91.1)
and 72.3 (95% CI: 69.2 to 75.2), respectively; both are higher
than the sensitivity and specificity of the MEWS (74.4, 95% CI:
58.5 to 86.0; and 54.2, 95% CI: 50.8 to 57.5, respectively).

• In critically ill patients presenting to the ED, we found ML
scores to be more accurate than the MEWS in predicting cardiac
arrest within 72 hours and death.

Abbreviations 

AUROC: area under the receiver operating characteristic curve; AVPU: A for 
'alert', V for 'reacting to vocal stimuli', P for 'reacting to pain', U for 
'unconscious'; CI: confidence interval; ECG: electrocardiogram; ED: emergency 
department; GCS: Glasgow Coma Scale; HRV: heart rate variability; MEWS: 
modified early warning score; ML: machine learning; PACS: Patient Acuity 
Category Scale; ROC: receiver operating characteristic; SpO2: oxygen
saturation.